
Tiered Sampling: An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams



Abstract

We introduce Tiered Sampling, a novel technique for approximately counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size $M$, which can be orders of magnitude smaller than the number of edges. Our method addresses the challenging task of counting sparse motifs, that is, subgraph patterns that have a low probability of appearing in a sample of $M$ edges of the graph, which is the maximum amount of data available to the algorithm at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing the more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. We demonstrate the advantage of our method in the specific applications of counting sparse 4-cliques and 5-cliques in massive graphs. We present a complete analysis and extensive experimental results using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4-cliques and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
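
The memory layout described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows a base reservoir of edges plus one higher tier, and it assumes the higher tier holds triangles (a natural sub-structure of a 4-clique) found when a new edge closes a wedge in the edge sample. The names `Reservoir` and `two_tier_sample`, the capacities, and the seed are hypothetical. The weighting needed to turn the samples into an unbiased count is omitted, and the stream is assumed to contain undirected edges without duplicates.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over a stream (classic reservoir sampling)."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0  # number of items offered so far

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Reservoir not full yet: always keep the item.
            self.items.append(item)
        elif self.rng.random() < self.capacity / self.seen:
            # Keep the item with probability capacity / seen,
            # evicting a uniformly chosen resident.
            self.items[self.rng.randrange(self.capacity)] = item


def two_tier_sample(edge_stream, edge_capacity, triangle_capacity, seed=0):
    """Single pass over the stream, filling an edge tier and a triangle tier.

    Assumption (illustrative only): the second tier stores triangles that the
    incoming edge closes together with two edges in the edge reservoir.
    """
    rng = random.Random(seed)
    edges = Reservoir(edge_capacity, rng)
    triangles = Reservoir(triangle_capacity, rng)

    for u, v in edge_stream:
        # Adjacency of the sampled edges, rebuilt per step for clarity
        # rather than efficiency.
        neighbors = {}
        for a, b in edges.items:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

        # Each common sampled neighbor w of u and v forms a triangle {u, v, w}.
        for w in neighbors.get(u, set()) & neighbors.get(v, set()):
            triangles.offer(frozenset((u, v, w)))

        edges.offer((u, v))

    return edges, triangles
```

For example, streaming the six edges of a complete graph on four vertices with capacities larger than the stream keeps every edge and records every triangle the stream closes. With smaller capacities, each tier stays within its fixed budget, which is the property the abstract relies on.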
